-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve NFS home directory restore documentation #5291
base: main
Are you sure you want to change the base?
Conversation
Also adds the option to mount an EBS volume to a pod through the deployer exec root-homes command.
This comment was marked as off-topic.
This comment was marked as off-topic.
@@ -25,18 +25,24 @@ To restore a home directory from a snapshot, we need to create a new EBS volume | |||
Please follow AWS's guidance for [restoring EBS volumes from a snapshot](https://docs.aws.amazon.com/prescriptive-guidance/latest/backup-recovery/restore.html#restore-files) to create a new EBS volume from the snapshot. | |||
``` | |||
|
|||
Once we have created a new EBS volume from the snapshot, we need to mount it to the EC2 instance attached to the existing EBS volume containing the NFS home directories. To do this, we need to find the instance ID of the EC2 instance attached to the existing EBS volume. This involves the following steps: | |||
Once we have created a new EBS volume from the snapshot, we can use the `deployer exec` command to mount the new EBS volume to a pod along with the existing NFS home directories volume. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we have created a new EBS volume from the snapshot, we can use the `deployer exec` command to mount the new EBS volume to a pod along with the existing NFS home directories volume. | |
Once we have created a new EBS volume from the snapshot, we can use the `deployer exec root-homes` command to mount the new EBS volume to a pod along with the existing NFS home directories volume. |
docs/howto/filesystem-management/filesystem-backups/restore-filesystem.md
Outdated
Show resolved
Hide resolved
for more information, see https://pre-commit.ci
@sgibson91 I did another test run to test the "wait for the pod to be ready" command that we've added. Everything worked as expected. I can merge this once you give your approval. |
I've pinged Yuvi on this too, because I want to make sure his concerns that caused him to reopen #5061 are addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Some minor suggested changes but otherwise this looks pretty close.
While not necessarily done as part of this PR, I do think it's important for us to fully end to end test the restore process at least once before calling it done (see pt 7 of https://mastodon.social/@nixCraft/113671487670162948). So let's make sure that at least one person tries all these steps top to bottom once, and verifies the completeness of the whole process.
Exciting to have this be so close to done!
@@ -37,6 +37,15 @@ def root_homes( | |||
extra_nfs_mount_path: str = typer.Option( | |||
None, help="Mount point for the extra NFS share" | |||
), | |||
restore_volume_id: str = typer.Option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this additional_ebs_volume_id
or something like that, as the code itself doesn't do anything to do with restoring - just mounting additional ebs volumes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use additional_volume_id
to keep it platform agnostic? I can add in the help text that only EBS volumes are supported for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable to me for now. Of course, we can always update things as we begin work on other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya i leave it to you both for now!
help="ID of volume to mount (e.g. vol-1234567890abcdef0) " | ||
"to restore data from", | ||
), | ||
restore_mount_path: str = typer.Option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on restore_volume_id about using 'restore' in the name. I'd say this could be additional_ebs_volume_mount_path
?
"kind": "PersistentVolume", | ||
"metadata": {"name": pv_name}, | ||
"spec": { | ||
"capacity": {"storage": restore_volume_size}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary for us to specify? I worry that if we get this wrong, it will trigger the kubernetes volume resizing feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect storage capacity is a required field. I'll try to confirm whether that's required or not.
Reading this warning note, looks like volume expansion is triggered only when the storage capacity is updated for an existing PVC that is already bound to a PV creating a mismatch between the specified storage capacity of the PVC and the bound PV.
I will confirm what happens when the specified volume on the PV and the PVC is larger than the actual size of the disk and ALLOWVOLUMEEXPANSION
is enabled on the storageclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some testing and storage capacity is indeed a required field.
I also tried passing a larger size than the actual disk size as the argument to see if that triggers a resize. The PV and PVC were labeled with the wrong capacity but the underlying disk was not resized.
@yuvipanda How would you like to move forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sunu ah, if it doesn't trigger a resize I would say:
- Hard code an intentionally wrong number here (as
df
and other commands we use don't rely on it, instead using other more linux filesystem level calls) - Write an inline comment explaining what is going on, and that it doesn't trigger resize
This would allow us to not have the engineer doing this need to know the filesize.
"accessModes": ["ReadWriteOnce"], | ||
"volumeName": pv_name, | ||
"storageClassName": "", | ||
"resources": {"requests": {"storage": restore_volume_size}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment about specifying sizes
restore_mount_path: str = typer.Option( | ||
"/restore-volume", help="Mount point for the volume" | ||
), | ||
restore_volume_size: str = typer.Option("100Gi", help="Size of the volume"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment later about specifying the size
@@ -140,6 +200,17 @@ def root_homes( | |||
tmpf.flush() | |||
|
|||
with cluster.auth(): | |||
# If we have an volume to restore from, create PV and PVC first | |||
if restore_volume_id: | |||
with tempfile.NamedTemporaryFile(mode="w", suffix=".json") as pv_tmpf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being created outside the try
whose finally
block contains the cleanup code, so it's possible that errors here will lead to the PV not being cleaned up. I think this hsould all be moved inside the same try
whose finally
has the cleanup code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch, thank you!
So I did actually do this in a meeting with Tarashish the other day. I created a volume from a snapshot, mounted it to a pod using the We definitely need to spread all of this knowledge out beyond just me and Tarashish though. |
That's excellent, @sgibson91! I think once #5291 (comment) is addressed, this can be merged. |
Remove the mentions of "restore". Rename "extra" volume to "additional" volume.
and also document why we are hardcoding a wrong value intentionally.
Also adds the option to mount an EBS volume to a pod through the
deployer exec root-homes
command.Follow up for #5286. Related to #5061.
@yuvipanda @sgibson91 - This is still a draft and I haven't really tested this but does the overall approach look ok?